
    Cramér type moderate deviation theorems for self-normalized processes

    Cramér type moderate deviation theorems quantify the relative error of the normal approximation and provide theoretical justification for many commonly used methods in statistics. In this paper, we develop a new randomized concentration inequality and establish a Cramér type moderate deviation theorem for general self-normalized processes, which include many well-known Studentized nonlinear statistics. In particular, a sharp moderate deviation theorem under optimal moment conditions is established for Studentized U-statistics. Comment: Published at http://dx.doi.org/10.3150/15-BEJ719 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
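    The relative error these theorems control is the ratio P(T_n >= x) / (1 - Phi(x)), where T_n is the self-normalized statistic and Phi is the standard normal CDF; a Cramér type result says this ratio tends to 1 uniformly over a range of x that grows with n. As a minimal sketch (not the paper's randomized concentration argument), the snippet below estimates that ratio by Monte Carlo for the simplest self-normalized process, S_n / V_n with S_n = sum of X_i and V_n^2 = sum of X_i^2. The sampler `centered_exponential`, the sample size, and the grid of x values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def self_normalized_sum(xs):
    """Return S_n / V_n, where S_n = sum(x_i) and V_n = sqrt(sum(x_i ** 2))."""
    s_n = sum(xs)
    v_n = math.sqrt(sum(x * x for x in xs))
    return s_n / v_n


def centered_exponential(rng):
    """Hypothetical skewed test distribution: Exp(1) shifted to mean zero."""
    return rng.expovariate(1.0) - 1.0


def tail_ratio(x, n, reps, sampler, rng):
    """Monte Carlo estimate of P(S_n / V_n >= x) / (1 - Phi(x))."""
    hits = 0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        if self_normalized_sum(sample) >= x:
            hits += 1
    return (hits / reps) / (1.0 - NormalDist().cdf(x))


if __name__ == "__main__":
    rng = random.Random(0)
    # Ratios near 1 mean the normal tail is accurate in relative terms at that x.
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, round(tail_ratio(x, 30, 5000, centered_exponential, rng), 3))
```

Far in the tail the Monte Carlo estimate becomes noisy because 1 - Phi(x) is small, so larger `reps` are needed there; the point of the sketch is only to make the quantity concrete.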

    Cramér-type moderate deviations for Studentized two-sample U-statistics with applications

    Two-sample U-statistics are widely used in a broad range of applications, including biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample U-statistics in a general framework, with the two-sample t-statistic and the Studentized Mann-Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample t-statistic. These results extend the applicability of existing statistical methodologies from the one-sample t-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and to the regularized bootstrap method are also discussed. Comment: Published at http://dx.doi.org/10.1214/15-AOS1375 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
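    Both prototypical examples fit one template: a two-sample U-statistic U = (1/(mn)) sum_{i,j} h(X_i, Y_j), centered at theta and divided by an estimated standard error. The sketch below assumes a common jackknife-style Studentization, SE^2 = var(q)/m + var(p)/n, where q_i and p_j are the row and column means of the kernel matrix; this is an assumption and not necessarily the exact normalization used in the paper. With h(x, y) = x - y and theta = 0 the row means are X_i - mean(Y) and the column means are mean(X) - Y_j, so the statistic reduces algebraically to the unpooled two-sample t-statistic. With h(x, y) = 1{x < y} and theta = 1/2 it gives a Studentized Mann-Whitney statistic for the null P(X < Y) = 1/2. The sample values are hypothetical.

```python
import math
from statistics import mean, variance


def studentized_two_sample_u(xs, ys, kernel, theta=0.0):
    """(U - theta) / SE for the two-sample U-statistic with kernel h(x, y).

    SE^2 = var(q) / m + var(p) / n, where q_i and p_j are the row and column
    means of the kernel matrix (estimates of the Hoeffding projections).
    """
    m, n = len(xs), len(ys)
    h = [[kernel(x, y) for y in ys] for x in xs]       # kernel matrix h(X_i, Y_j)
    u = sum(sum(row) for row in h) / (m * n)
    q = [sum(row) / n for row in h]                     # projection estimate for each X_i
    p = [sum(h[i][j] for i in range(m)) / m for j in range(n)]  # and for each Y_j
    se = math.sqrt(variance(q) / m + variance(p) / n)
    return (u - theta) / se


def welch_t(xs, ys):
    """Two-sample t-statistic with unpooled sample variances."""
    return (mean(xs) - mean(ys)) / math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))


if __name__ == "__main__":
    xs = [1.2, 2.4, 3.1, 0.8, 2.9]   # hypothetical sample X
    ys = [0.5, 1.1, 0.3, 1.9]        # hypothetical sample Y
    # Kernel x - y with theta = 0 reproduces the two-sample t-statistic.
    print(studentized_two_sample_u(xs, ys, lambda x, y: x - y), welch_t(xs, ys))
    # Kernel 1{x < y} with theta = 1/2 gives a Studentized Mann-Whitney statistic.
    print(studentized_two_sample_u(xs, ys, lambda x, y: float(x < y), theta=0.5))
```

Keeping the kernel as a parameter mirrors the "general framework" framing: only `kernel` and `theta` change between the two examples.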

    Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications

    Over the last two decades, many exciting variable selection methods have been developed for finding, from a large pool, a small group of covariates that are associated with the response. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best linear combination of $s$ out of the $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of this simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
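    For the simplest case s = 1, the maximum spurious correlation is max_j |corr(X_j, Y)|. When Y is independent of X, sqrt(n) times this maximum behaves approximately like max_j |Z_j| with Z Gaussian and covariance equal to the correlation matrix of X, which is why the distribution depends on the unknown covariance. The sketch below is a simplified illustration under that s = 1 assumption, not the paper's procedure for general s: it draws Gaussian multipliers xi_i so that, conditional on the data, the bootstrap vector has the sample correlation matrix of X as its covariance, and reads off an upper quantile. The data-generating choices (n, p, Gaussian design) and the function names are hypothetical.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit variance (1/n normalization)."""
    n = len(col)
    mu = sum(col) / n
    centered = [v - mu for v in col]
    sd = math.sqrt(sum(c * c for c in centered) / n)
    return [c / sd for c in centered]


def max_abs_correlation(X, y):
    """Sample max_j |corr(X_j, y)|, the s = 1 maximum spurious correlation."""
    n = len(y)
    y_std = standardize(y)
    best = 0.0
    for j in range(len(X[0])):
        x_std = standardize([row[j] for row in X])
        best = max(best, abs(sum(a * b for a, b in zip(x_std, y_std)) / n))
    return best


def multiplier_bootstrap_quantile(X, level, B, rng):
    """Gaussian-multiplier estimate of the `level` quantile of
    sqrt(n) * max_j |corr(X_j, Y)| when Y is independent of X (s = 1 only)."""
    n, p = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    draws = []
    for _ in range(B):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Given X, (sum_i xi_i * cols[j][i] / sqrt(n))_j is Gaussian with covariance
        # equal to the sample correlation matrix of X.
        stat = max(abs(sum(w * c for w, c in zip(xi, col))) for col in cols) / math.sqrt(n)
        draws.append(stat)
    draws.sort()
    return draws[max(0, math.ceil(level * B) - 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 100, 50
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of X by construction
    observed = math.sqrt(n) * max_abs_correlation(X, y)
    q95 = multiplier_bootstrap_quantile(X, 0.95, 200, rng)
    print(f"sqrt(n) * max|corr| = {observed:.3f}, bootstrap 95% quantile = {q95:.3f}")
```

Dividing the bootstrap quantile by sqrt(n) gives a rough upper limit for the spurious correlation itself; an observed correlation well above it is harder to attribute to chance alone.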